Distributed Information Retrieval With Skewed Database Size Distributions
Authors
Abstract
The proliferation of government information on local area networks and the Internet creates the problem of finding information that may be distributed among many disjoint text databases (distributed information retrieval, or federated search). A distributed information retrieval system is composed of three components: resource representation, resource selection, and result merging. Previous research suggested that the CORI algorithm is one of the most effective resource selection algorithms, but its effectiveness in environments containing a wide range of database sizes has not been studied thoroughly. This paper shows that the CORI algorithm does not work well in environments with a skewed distribution of database sizes. We present a new resource selection algorithm based on estimating the distribution of relevant documents among the online databases. This new algorithm selects resources more accurately than the CORI algorithm, which can lead to improved document rankings.
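The idea of selecting resources by the estimated distribution of relevant documents can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes, purely for illustration, that each database has a size estimate and a small document sample, and that the number of sampled documents matching the query is a usable proxy for relevance. The names `Database`, `estimated_relevant`, and `select_resources`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    estimated_size: int   # assumed estimate of the total number of documents
    sample_size: int      # number of documents sampled from the database
    sample_matches: int   # sampled documents matching the query (relevance proxy)


def estimated_relevant(db: Database) -> float:
    """Scale the sample's match rate up to the estimated database size."""
    if db.sample_size == 0:
        return 0.0
    return db.estimated_size * db.sample_matches / db.sample_size


def select_resources(dbs: list[Database], k: int) -> list[Database]:
    """Return the k databases with the most estimated relevant documents."""
    return sorted(dbs, key=estimated_relevant, reverse=True)[:k]


# Hypothetical collection with a skewed size distribution.
dbs = [
    Database("small_focused", 1_000, 100, 20),
    Database("large_general", 100_000, 100, 5),
    Database("medium", 10_000, 100, 1),
]
ranked = select_resources(dbs, 2)
print([db.name for db in ranked])
```

In this toy setting the large database ranks first even though its sample has a lower match rate, because its size makes it likely to hold more relevant documents in total. A score that ignores database size would miss this effect.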
Similar resources
The Effect of Database Size Distribution on Resource Selection Algorithms
Resource selection is an important topic in distributed information retrieval research. It can be a component of a distributed information retrieval task and can also serve as an independent application of database recommendation system together with the resource representation part. There is a large body of valuable prior research on resource selection but very little has studied about the eff...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Intraspecific Body Size Frequency Distributions of Insects
Although interspecific body size frequency distributions are well documented for many taxa, including the insects, intraspecific body size frequency distributions (IaBSFDs) are more poorly known, and their variation among mass-based and linear estimates of size has not been widely explored. Here we provide IaBSFDs for 16 species of insects based on both mass and linear estimates and large sampl...
An Intelligent Framework For Distributed Query Optimization Of Spatial Data In Geographic Information Systems
The Geographic Information System (GIS) uses the spatial database for its data storing purposes. As the spatial database takes huge space, the size and data retrieval cost of database increases. That’s why we have to use some optimized technique to retrieve the data from the database. Also, we can apply the distributed database concept to the spatial database to achieve better performance. Afte...
Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases
Abstract Introduction: The growth and expansion of the Internet has changed the way information is accessed and many facilities have been created on the Web to facilitate and expedite information locating. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from ProQuest and Science Direct databases. Materials and Methods: The pr...